9 research outputs found

    Retrieving Ambiguous Sounds Using Perceptual Timbral Attributes in Audio Production Environments

    For over a decade, one of the well-identified problems within audio production environments has been the effective retrieval and management of sound libraries. Most self-recorded and commercially produced sound libraries are well structured in terms of metadata and textual descriptions, allowing traditional text-based retrieval approaches to obtain satisfactory results. However, traditional information retrieval techniques are limited when retrieving ambiguous sound collections (i.e., sounds with no identifiable origin, foley sounds, synthesized sound effects, abstract sounds) because of the difficulty of describing such sounds textually and their complex psychoacoustic nature. Early psychoacoustic studies propose perceptual acoustic qualities as an effective way of describing this category of sounds [1]. In Music Information Retrieval (MIR) research, this problem has mostly been studied in the context of content-based audio retrieval. However, we observed that most commercially available systems integrate neither advanced content-based sound descriptions nor the visualization and interface design approaches that have evolved in recent years. Our research mainly aimed to investigate two things: (1) the development of an audio retrieval system incorporating high-level timbral features as search parameters, and (2) a user-centered approach to integrating these features into audio production pipelines through expert-user studies. In this project, we present a prototype similar to traditional sound browsers (list-based browsing) with the added functionality of filtering and ranking sounds by perceptual timbral features such as brightness, depth, roughness, and hardness. Our main focus was on the retrieval process driven by timbral features. Inspired by the recent focus on user-centered systems ([2], [3]) in the MIR community, we conducted in-depth interviews and a qualitative evaluation of the system with expert users in order to identify the underlying problems. Our studies examined the potential applications of high-level perceptual timbral features in audio production pipelines using a probe system and expert-user studies. We also outline future guidelines and possible improvements to the system based on the outcomes of this research.
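
    As an illustration of the kind of retrieval the prototype describes, the sketch below filters a sound library by ranges on perceptual timbral attributes and ranks the matches by one of them. The data model and the scores are invented for the example; this is not the project's actual implementation.

```typescript
// Hypothetical sound-library records annotated with perceptual timbral scores in [0, 1].
// Illustrative data model only, not the prototype's internal representation.
interface Sound {
  id: string;
  name: string;
  timbre: { brightness: number; depth: number; roughness: number; hardness: number };
}

type TimbralAttribute = keyof Sound["timbre"];

interface RangeFilter {
  attr: TimbralAttribute;
  min: number;
  max: number;
}

// Keep only sounds whose attributes fall inside every requested range,
// then rank the survivors by a single attribute, highest first.
function filterAndRank(sounds: Sound[], ranges: RangeFilter[], rankBy: TimbralAttribute): Sound[] {
  return sounds
    .filter((s) => ranges.every((r) => s.timbre[r.attr] >= r.min && s.timbre[r.attr] <= r.max))
    .sort((a, b) => b.timbre[rankBy] - a.timbre[rankBy]);
}

// Example: bright but not too rough sounds, brightest first.
const library: Sound[] = [
  { id: "s1", name: "glass swirl", timbre: { brightness: 0.8, depth: 0.2, roughness: 0.1, hardness: 0.5 } },
  { id: "s2", name: "low rumble", timbre: { brightness: 0.2, depth: 0.9, roughness: 0.6, hardness: 0.3 } },
];
console.log(
  filterAndRank(library, [{ attr: "brightness", min: 0.6, max: 1 }, { attr: "roughness", min: 0, max: 0.4 }], "brightness")
);
```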

    MEL: a music entity linking system

    Communication presented at the 18th International Society for Music Information Retrieval Conference (ISMIR 2017), held 23-27 October 2017 in Suzhou, China. In this work, we present MEL, the first Music Entity Linking system. MEL is able to identify mentions of musical entities (e.g., albums, songs, and artists) in free text, and to disambiguate them to a music knowledge base, i.e., MusicBrainz. MEL combines different state-of-the-art libraries and SimpleBrainz, an RDF knowledge base created from MusicBrainz after a simplification process. MEL is released as a REST API and as an online web demo. This work was partially funded by the Spanish Ministry of Economy and Competitiveness under the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502).
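
    To make the REST usage concrete, the following sketch shows how a client might post free text to an entity-linking service like MEL and read back linked entities. The endpoint URL, request body, and response shape are assumptions made for illustration, not the documented MEL API.

```typescript
// Minimal sketch of querying an entity-linking REST service such as MEL.
// The URL, parameter names, and response fields below are hypothetical.
interface LinkedEntity {
  mention: string;        // surface form found in the input text
  type: string;           // e.g. "artist", "album", "song"
  musicbrainzId: string;  // identifier in the target knowledge base
}

async function linkEntities(text: string): Promise<LinkedEntity[]> {
  const response = await fetch("https://example.org/mel/api/link", { // hypothetical endpoint
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  if (!response.ok) throw new Error(`Linking request failed: ${response.status}`);
  return (await response.json()) as LinkedEntity[];
}

// Example: mentions of "Abbey Road" and "The Beatles" would be disambiguated to MusicBrainz entries.
linkEntities("I listened to Abbey Road by The Beatles").then(console.log);
```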

    Essentia API: a web API for music audio analysis

    We present Essentia API, a web API providing access to a collection of state-of-the-art music audio analysis and description algorithms based on Essentia, an open-source library and set of machine learning (ML) models for audio and music analysis. We are developing it as part of a broader project in which we explore strategies for the commercial viability of technologies developed at the Music Technology Group (MTG) following open science and open source practices, which involves finding licensing schemes and building custom solutions. Currently, the API supports music auto-tagging and classification algorithms (for genre, instrumentation, mood/emotion, danceability, approachability, and engagement), as well as algorithms for musical key, tempo, loudness, and many more. In the future, we envision expanding it with new machine learning models developed by the MTG and our collaborators to make them accessible to a broader community of users.
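
    A hypothetical client call might look like the sketch below: upload an audio file and read back high-level tags and descriptors. The endpoint path, form field, and response fields are placeholders, not the actual Essentia API specification.

```typescript
// Sketch of submitting an audio file to a web analysis API such as Essentia API.
// Endpoint, form field, and response layout are hypothetical placeholders.
import { readFile } from "node:fs/promises";

async function analyzeTrack(path: string): Promise<void> {
  const audio = await readFile(path);
  const form = new FormData();
  form.append("audio", new Blob([audio]), "track.mp3"); // hypothetical field name

  const response = await fetch("https://example.org/essentia-api/analyze", { // hypothetical URL
    method: "POST",
    body: form,
  });
  const result = await response.json();

  // Hypothetical response layout: high-level tags plus key/tempo/loudness descriptors.
  console.log(result.tags, result.key, result.bpm, result.loudness);
}

analyzeTrack("track.mp3");
```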

    Essentia.js: a JavaScript library for music and audio analysis on the web

    Communication presented at the International Society for Music Information Retrieval Conference, held virtually 11-16 October 2020. Open-source software libraries for audio/music analysis and feature extraction have a significant impact on the development of Audio Signal Processing and Music Information Retrieval (MIR) systems. Despite the abundance of such tools on native computing platforms, there is a lack of an extensive and easy-to-use reference library for audio feature extraction on the Web. In this paper, we present Essentia.js, an open-source JavaScript (JS) library for audio and music analysis on both web clients and JS-based servers. Along with the Web Audio API, it can be used for efficient and robust real-time audio feature extraction in web browsers. Essentia.js is modular, lightweight, and easy to use, deploy, maintain, and integrate into the existing plethora of JS libraries and Web technologies. It is powered by a WebAssembly back-end of the Essentia C++ library, which facilitates a JS interface to a wide range of low-level and high-level audio features. It also provides a higher-level JS API and add-on MIR utility modules, along with extensive documentation, usage examples, and tutorials. We benchmark the proposed library on two popular web browsers, the Node.js engine, and Android devices, comparing it to the native performance of Essentia and the Meyda JS library.
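
    The typical usage pattern can be sketched as follows: wrap the WebAssembly back end, convert a Float32Array signal into an Essentia vector, and call an algorithm by its Essentia name. Import paths and exact method signatures vary by essentia.js version and bundler, so treat this as an illustration to be checked against the library documentation rather than a verified example.

```typescript
// Minimal sketch of the essentia.js usage pattern (names to be verified against the docs).
import { Essentia, EssentiaWASM } from "essentia.js"; // import style may differ by version/bundler

const essentia = new Essentia(EssentiaWASM);

// A one-second 440 Hz sine at 44.1 kHz as a stand-in for decoded audio samples.
const sampleRate = 44100;
const signal = new Float32Array(sampleRate);
for (let i = 0; i < signal.length; i++) {
  signal[i] = Math.sin((2 * Math.PI * 440 * i) / sampleRate);
}

const vector = essentia.arrayToVector(signal); // JS typed array -> Essentia vector
const key = essentia.KeyExtractor(vector);     // e.g. { key, scale, strength }
const energy = essentia.Energy(vector);        // e.g. { energy }

console.log(key, energy);
```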

    Audio and music analysis on the web using Essentia.js

    Open-source software libraries have a significant impact on the development of Audio Signal Processing and Music Information Retrieval (MIR) systems. Despite the abundance of such tools, there is a lack of an extensive and easy-to-use reference library for audio feature extraction on Web clients. In this article, we present Essentia.js, an open-source JavaScript (JS) library for audio and music analysis on both web clients and JS engines. Along with the Web Audio API, it can be used for both offline and real-time audio feature extraction in web browsers. Essentia.js is modular, lightweight, and easy to use, deploy, maintain, and integrate into the existing plethora of JS libraries and web technologies. It is powered by a WebAssembly back end cross-compiled from the Essentia C++ library, which facilitates a JS interface to a wide range of low-level and high-level audio features, including signal processing MIR algorithms as well as pre-trained TensorFlow.js machine learning models. It also provides a higher-level JS API and add-on MIR utility modules, along with extensive documentation, usage examples, and tutorials. We benchmark the proposed library on two popular web browsers, the Node.js engine, and four devices, including Android and iOS mobile devices, comparing it to the native performance of Essentia and the Meyda JS library. The work on Essentia.js has been partially funded by the Ministry of Science and Innovation of the Spanish Government under grant agreement PID2019-111403GB-I00 (Musical AI).
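
    For the offline browser path mentioned above, a minimal sketch combines the standard Web Audio API for decoding with essentia.js for feature extraction. The Web Audio calls are standard; the essentia.js method names are illustrative and should be verified against the library, as in the previous sketch.

```typescript
// Sketch of offline (non-real-time) feature extraction in the browser:
// decode a user-provided file with the Web Audio API, then analyze it with essentia.js.
import { Essentia, EssentiaWASM } from "essentia.js"; // import style may differ by version/bundler

async function extractFromFile(file: File): Promise<void> {
  const essentia = new Essentia(EssentiaWASM);

  // Decode the uploaded audio file to PCM using the standard Web Audio API.
  const audioContext = new AudioContext();
  const audioBuffer = await audioContext.decodeAudioData(await file.arrayBuffer());
  const samples = audioBuffer.getChannelData(0); // mono channel as Float32Array

  const vector = essentia.arrayToVector(samples);
  const rhythm = essentia.PercivalBpmEstimator(vector); // illustrative algorithm call, e.g. { bpm }
  console.log(`Estimated tempo: ${rhythm.bpm} BPM`);
}
```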

    Da-TACOS: A dataset for cover song identification and understanding

    Communication presented at the 20th annual conference of the International Society for Music Information Retrieval (ISMIR), held 4-8 November 2019 in Delft, the Netherlands. This paper focuses on Cover Song Identification (CSI), an important research challenge in content-based Music Information Retrieval (MIR). Although the task itself is interesting and challenging for both academia and industry, there are a number of limitations holding back the advancement of current approaches. We specifically address two of them in the present study. First, the number of publicly available datasets for this task is limited, and there is no publicly available benchmark set that is widely used among researchers for comparative algorithm evaluation. Second, most of the algorithms are not publicly shared and reproducible, limiting the comparison of approaches. To overcome these limitations we propose Da-TACOS, a DaTAset for COver Song Identification and Understanding, and two frameworks for feature extraction and benchmarking to facilitate reproducibility. Da-TACOS contains 25K songs represented by unique editorial metadata plus 9 low- and mid-level features pre-computed with open-source libraries, and is divided into two subsets. The Cover Analysis subset contains audio features (e.g., key, tempo) that can serve to study how musical characteristics vary for cover songs. The Benchmark subset contains the set of features that have been frequently used in CSI research, e.g., chroma, MFCC, beat onsets, etc. Moreover, we provide initial benchmarking results for a selected number of state-of-the-art CSI algorithms using our dataset, and for reproducibility, we share a GitHub repository containing the feature extraction and benchmarking frameworks. This work is partially supported by the MIP-Frontiers project, the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 765068, and by TROMPA, the Horizon 2020 project 770376-2.
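
    For readers who want to explore the editorial metadata, the sketch below assumes the cover cliques are stored as a nested JSON mapping (work id to performance id to per-recording metadata), following the dataset description of songs grouped into cover cliques. The file name and exact layout are assumptions and should be verified against the Da-TACOS repository.

```typescript
// Sketch of summarizing Da-TACOS editorial metadata in Node.js.
// File name and nesting (work id -> performance id -> metadata) are assumed, not verified.
import { readFile } from "node:fs/promises";

type PerformanceMetadata = Record<string, unknown>;                    // per-recording editorial metadata
type CliqueMap = Record<string, Record<string, PerformanceMetadata>>;  // work id -> performance id -> metadata

async function summarizeCliques(path: string): Promise<void> {
  const cliques: CliqueMap = JSON.parse(await readFile(path, "utf8"));
  for (const [workId, performances] of Object.entries(cliques)) {
    console.log(`${workId}: ${Object.keys(performances).length} performances (covers of one work)`);
  }
}

summarizeCliques("da-tacos_benchmark_subset_metadata.json"); // assumed file name
```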